Learning of dialogue states and language model of spoken information system

ABSTRACT

In this invention dialogue states for a dialogue model are created using a training corpus of example human—human dialogues. Dialogue states are modelled at the turn level rather than at the move level, and the dialogue states are derived from the training corpus. The range of operator dialogue utterances is actually quite small in many services and therefore may be categorised into a set of predetermined meanings. This is an important assumption which is not true of general conversation, but is often true of conversations between telephone operators and people. Phrases are specified which have specific substitution and deletion penalties; for example, the two phrases “I would like to” and “can I” may be specified as a possible substitution with low or zero penalty. This allows common equivalent phrases to be given low substitution penalties. Insignificant phrases such as ‘erm’ are given low or zero deletion penalties.

This application is the US national phase of international application PCT/GB00/04904 filed 19 Dec. 2000 which designated the U.S.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the automatic classification of sequences of symbols, in particular of sequences of words for use in the production of a dialogue model, in particular to the production of a dialogue model for natural language automated call routing systems. This invention also relates to the generation of an insignificant symbol set and of an equivalent symbol sequence pair set for use in such automatic classification.

2. Detailed Description of Related Art

In a call routing service utilising a human operator, user requests may be categorised into 4 types. An explicit user request is where the user knows the service which is required, for example “Could you put me through to directory enquiries please?”. An implicit user request is where the user does not explicitly name the service required, for example “Can I have the number for . . . please?”. A general problem description is where the customer does not know which service they require, but expects the operator to be able to help. The operator generally engages in a dialogue in order to identify the required service. The final category is ‘other’ where there is confusion about the problem, or what the service can do.

Automated call routing can be achieved by the use of a touch tone menu in an interactive voice response (IVR) system. It is widely accepted that these systems can be difficult to use, and much skill is needed in the design of suitable voice menu prompts. Even designs using best-practice have several fundamental weaknesses. In particular, the mapping from system function to user action (pressing a key) is usually completely arbitrary and therefore difficult to remember. To alleviate this problem, menus must be kept very short, which can lead to complex hierarchical menu structures which are difficult to navigate. In addition, many users have significant difficulty in mapping their requirements onto one of the listed system options. Touch tone IVR systems can be effective for explicit user requests, may sometimes work with implicit user requests, but are inappropriate for general problem descriptions or confused users.

Spoken menu systems are the natural extension of touch tone IVR systems which use speech recognition technology. Their main advantages are a reduction in the prompt length, and a direct relationship between meaning and action—for example saying the word ‘operator’ rather than pressing an arbitrary key. However, many of the limitations of touch tone systems remain: the difficulty of mapping customer requirements onto the menu options, and a strictly hierarchical navigation structure. There is also the added difficulty of non-perfect speech recognition performance, and the consequent need for error recovery strategies.

Word spotting can be used in a system which accepts a natural language utterance from a user. For some applications word spotting is a useful approach to task identification. However some tasks, for example line test requests, are characterised by high frequencies of problem specification, so it is difficult if not impossible to determine the task which is required using word spotting techniques.

The use of advanced topic identification techniques to categorise general problem descriptions in an automated natural language call steering system is the subject of ongoing research; for example, the automated service described by A. L. Gorin et al in “How May I Help You”, Proc. of IVTTA, pp. 57-60, Basking Ridge, September 1996, uses automatically acquired salient phrase fragments for call classification. In contrast, other studies either do not consider this type of request at all, or attempt to exclude them from automatic identification.

In the above referenced automated service, a classifier is trained using a set of speech utterances which are categorised as being directed to one of a predetermined set of tasks. The problem with this prior art system is that the tasks need to be predetermined, and in this case are defined to be the operator action resulting from the entire interaction. The relationship between the required action and the operator dialogue necessary to determine the action is not easily discovered. In a manual call routing system there are often multiple dialogue turns before an operator action occurs. It is desirable for an automated natural language call steering system to behave in a similar way to a manually operated call steering system for at least a subset of operator supplied services. In order to do this it is necessary to have a dialogue model which can deal with a range of different styles of enquiries.

BRIEF DESCRIPTION OF THE INVENTION

According to one aspect of the present invention there is provided a method of classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols comprising the steps of determining a distance between each sequence and each other sequence in said plurality of sequences in dependence upon a set of insignificant symbol sequences and a set of equivalent symbol sequence pairs; and grouping the plurality of sequences into a plurality of sets in dependence upon said distances.

Preferably the symbols are words transcribed from operator speech signals generated during an enquiry to a call centre. The words may be transcribed from operator speech signals using a speaker dependent speech recogniser.

According to a second aspect of the invention there is also provided a method of generating a set of insignificant symbol sequences for use in the method of the first aspect of this invention, comprising the steps of classifying a plurality of sequences of symbols into a plurality of sets; for each of the sets, determining an optimal alignment between each sequence thereof and each other sequence in that set; and allocating a symbol or sequence of symbols to the set of insignificant symbol sequences, the symbol or sequence of symbols having been deleted to obtain an optimal alignment between two sequences of a set.

According to a third aspect of the invention there is provided a method of generating a set of equivalent symbol sequence pairs for use in the method of the first aspect of this invention, comprising the steps of classifying a plurality of sequences of symbols into a plurality of sets; determining an optimal alignment between each sequence in a set and each other sequence in that set; and allocating a pair of symbols or sequences of symbols to the set of equivalent symbol sequences, the symbols or sequences of symbols having been substituted for each other to obtain an optimal alignment between two sequences of a set.

A method of generating a grammar for enquiries made to a call centre, using the plurality of sets of sequences of words generated according to the first aspect of the present invention, comprising the steps of transcribing a plurality of enquiries according to which of the sets the sequences of words in the enquiry occur; and generating a grammar in dependence upon the resulting transcription, is also provided.

A method of measuring the occurrence of particular types of telephone enquiry received in a call centre using the plurality of subsets of sequences of words generated according to the method of the first aspect of the invention is also provided.

Apparatus for performing the methods of the invention is also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 is a schematic representation of a computer loaded with software embodying the present invention;

FIG. 2 shows a known architecture of a natural language system;

FIG. 3 represents part of a simple dialogue structure for an operator interaction;

FIG. 4 shows the architecture of a dialogue discovery tool;

FIG. 5 is a flow chart showing the operation of the dialogue discovery tool of FIG. 4; and

FIG. 6 is a flow chart showing the operation of a clustering algorithm of FIG. 5.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION

FIG. 1 illustrates a conventional computer 101, such as a Personal Computer, generally referred to as a PC, running a conventional operating system 103, such as Windows (a Registered Trade Mark of Microsoft Corporation), and having a number of resident application programs 105 such as a word processing program, a network browser and e-mail program or a database management program. The computer 101 also has a suite of programs 109, 109′, 109″, 122 and 123 for use with a plurality of sequences of words (also described as sentences) transcribed from operator utterances in a call centre. The suite includes a dialogue state discovery program 109 that enables the sequences to be classified to form a plurality of sets of sequences. Programs 109′ and 109″ respectively allow a set of insignificant words and word sequences, and a set of equivalent word sequence pairs, to be generated for use by the program 109. Program 122 uses the output of program 109 to generate a grammar for transcribed calls and program 123 uses the output of program 109 to measure statistics about the types of calls which are being handled in the call centre.

The computer 101 is connected to a conventional disc storage unit 111 for storing data and programs, a keyboard 113 and mouse 115 for allowing user input and a printer 117 and display unit 119 for providing output from the computer 101. The computer 101 also has access to external networks (not shown) via a network card 121.

FIG. 2 shows a known architecture of a natural language call steering system. A user's speech utterance is received by a speech recogniser 10. The received speech utterance is analysed by the recogniser 10 with reference to a language model 22. The language model 22 represents sequences of words or sub-words which can be recognised by the recogniser 10 and the probability of these sequences occurring. The recogniser 10 analyses the received speech utterance and provides as an output a graph which represents sequences of words or sub-words which most closely resemble the received speech utterance. Recognition results are expected to be very error prone, and certain words or phrases will be much more important to the meaning of the input utterance than others. Thus, confidence values associated with each word in the output graph are also provided. The confidence values give a measure related to the likelihood that the associated word has been correctly recognised by the recogniser 10. The output graph, including the confidence measures, is received by a classifier 6, which classifies the received graph according to a predefined set of meanings, with reference to a semantic model 20, to form a semantic classification. The semantic classification comprises a vector of likelihoods, each likelihood relating to a particular one of the meanings. A dialogue manager 4 operates using a state based representation scheme as will be described more fully later with reference to FIG. 3. The dialogue manager 4 uses the semantic classification vector and information about the current dialogue state together with information from a dialogue model 18 to instruct a message generator 8 to generate a message, which is spoken to the user via a speech synthesiser 12. The message generator 8 uses information from a message model 14 to construct appropriate messages. The speech synthesiser uses a speech unit database 16 which contains speech units representing a particular voice.
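By way of illustration only, the data flow just described might be sketched as follows, treating each numbered component of FIG. 2 as a black-box callable; every function and variable name here is an illustrative assumption, not taken from the patent:

```python
# A minimal sketch of one dialogue turn through the FIG. 2 pipeline,
# with each numbered component treated as a black-box callable.
# All names are illustrative.

def steer_one_turn(utterance, recogniser, classifier, dialogue_manager,
                   message_generator, synthesiser, dialogue_state):
    # Recogniser 10 + language model 22: word graph with confidence values.
    word_graph = recogniser(utterance)
    # Classifier 6 + semantic model 20: vector of likelihoods over meanings.
    semantic_vector = classifier(word_graph)
    # Dialogue manager 4 + dialogue model 18: next state and message request.
    new_state, message_request = dialogue_manager(semantic_vector, dialogue_state)
    # Message generator 8 + message model 14, then synthesiser 12 + unit database 16.
    prompt_text = message_generator(message_request)
    return synthesiser(prompt_text), new_state
```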

Analysis of human—human operator service calls shows that nearly half of callers specify a problem rather than requesting a particular service; approximately one fifth ask the operator to do something but do not actually use an explicit service name; approximately a third explicitly ask for a particular service; and 2% speak outside the domain of the service offered (e.g. obscene calls).

After 10,000 calls have been received a new word is still observed in one in every four calls, therefore the language model 22 has to be able to deal with previously unseen words. Callers are very disfluent: ‘uhms’, ‘ers’ and restarts of words are common, therefore recognition accuracy is likely to be poor. The distribution of certain request types is very skewed. Some, for example problems getting through, are very common. A large proportion of calls are relatively simple to resolve once the problem/request has been correctly identified. Therefore, although the language used by the user to describe problems may be complex, a fairly crude set of predetermined meanings may suffice to identify and correctly deal with a large proportion of callers.

In dialogue modelling, ‘games theory’ is often used to describe conversations. A brief description of games theory follows, so that the terminology used in the following description may be understood. Games theory suggests that human—human conversations can be broken down into specific games which are played out by the participants, each participant taking ‘turns’ in the dialogue. These games are made up of a number of moves, and multiple dialogue moves may be made in a single dialogue turn. For example ‘reverse charge, thank-you, to which code and number?’ is a single turn comprising two moves. Games played out are specific to a task. Games are considered to obey a stack based model, i.e. once one game is complete then the parent game is returned to unless a new child game is simultaneously initiated in its place.

The dialogue manager 4 interfaces to external systems 2 (for example, a computer telephony integration link for call control or a customer records database). The dialogue manager 4 controls transitions from and to dialogue states. In known systems dialogue states are usually selected by the designer and usually relate to a specific question or a specific statement, which are known as dialogue moves when the games theory, as described above, is applied to dialogue analysis.

In this invention the dialogue model is trained using a training corpus of example human—human dialogues. Dialogue states are modelled at the turn level rather than at the move level, and the dialogue states are derived from the training corpus.

FIG. 3 represents part of a simple dialogue structure for an operator interaction, represented as a tree grammar. Arcs 24 represent customer turns (which have not been annotated), and nodes 26 represent operator turns (which have been annotated with the operator utterance). The top path for example represents the instance where a customer has reported a fault on a line, the operator apologises and asks which code and number, and then echoes the required number back to the user. In this portion, the symbol n represents any number or the word ‘double’.

The assumption underlying this style of representation is that the range of operator dialogue moves and turns is actually quite small in many services and therefore may be categorised into a set of predetermined meanings. This is an important assumption which is not true of general conversation, but is often true of conversations between telephone operators and people.

FIG. 4 shows a dialogue discovery tool 30. The dialogue discovery tool 30 uses a world knowledge database 32 which contains information such as lists of town names, surnames and ways of saying dates and times. A local knowledge database 34 is used by the dialogue discovery tool 30 in generating a semantic model 36 suitable for use in the natural language call steering system of FIG. 2. During use the dialogue discovery tool 30 adds information to the local knowledge database 34 according to data read from a corpus 38 of call examples.

The operation of the dialogue discovery tool 30 will now be described in more detail with reference to FIG. 5. The dialogue discovery tool 30 aims to discover the operator dialogue turns which have the same dialogue function as far as the caller is concerned. For example ‘sorry about that, which code and number is that?’ and ‘I'm sorry, what code and number is that please?’ have the same dialogue function. Also, in the example of FIG. 3, blocks of numbers of particular sizes are considered to have the same dialogue function regardless of the specific numbers involved.

FIG. 5 shows diagrammatically the process of generating data for the local knowledge database 34 and the semantic model 36. The corpus 38 is separated into a supervised training corpus 42 and an unsupervised training corpus 44. Each sentence in each corpus is assumed to comprise a sequence of tokens (also referred to as words in this specification) separated by white space. Each token comprises a sequence of characters. Initially, at step 40, world knowledge data from the world knowledge database 32 is used to identify classes in the training corpus. These classes may be represented by context free grammar rules defining members of the class—for example, all town names may be listed and mapped to a single token as it is regarded that all town names perform the same dialogue function (a sketch of this mapping follows below). A dynamic programming (DP) match is then performed at step 46. The DP match aligns each sentence with each other sentence by optimally substituting tokens for each other and/or deleting tokens, as will be described in more detail below. The DP match uses any local knowledge in the local knowledge database 34 which has been stored previously. The sentences in the supervised training corpus 42 are clustered using a clustering algorithm at step 48. The clustering algorithm used in this embodiment of the invention will be described later with reference to FIG. 6. The clustering algorithm produces clusters of sentences which are regarded as having the same dialogue function, and one ‘cluster’ for sentences which are not similar to any of the other sentences. The clusters thus generated are manually checked at step 50. The words which have been deleted in forming a cluster are stored in the local knowledge database 34 as representing insignificant words or phrases. The words or phrases which have been substituted for each other in forming a cluster are stored in the local knowledge database 34 as representing synonymous words or phrases. Data stored in the local knowledge database 34 and the world knowledge database 32 are then used by a DP match process at step 52 to form dialogue states using the unsupervised training corpus 44. The unsupervised training corpus may include sentences from the supervised training corpus 42.
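A minimal sketch of the class-identification of step 40, assuming world knowledge is held as a mapping from class names to sets of member phrases; the class names and members shown are invented for the example, not taken from the patent:

```python
# A sketch of the class-identification step 40. World knowledge is assumed
# to be a dict from class names to member tokens; all entries here are
# illustrative.

WORLD_KNOWLEDGE = {
    "TOWN": {"ipswich", "norwich", "colchester"},
    "SURNAME": {"smith", "jones", "patel"},
}

def map_classes(tokens, world=WORLD_KNOWLEDGE):
    """Replace any token that is a member of a class with the class token,
    since all members of a class perform the same dialogue function."""
    out = []
    for tok in tokens:
        for cls, members in world.items():
            if tok.lower() in members:
                tok = cls
                break
        out.append(tok)
    return out

print(map_classes("the number for ipswich please".split()))
# ['the', 'number', 'for', 'TOWN', 'please']
```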

The training corpus 38 comprises operator utterances. The corpus is created by listening to operator utterances and transcribing the words manually. It is also possible to train a speaker dependent speech recogniser for one or more operators and to automatically transcribe operator utterances using the speaker dependent speech recogniser. The advantage of this approach is that the database can be created automatically from a very large number of calls; for example, all the operator calls in an entire week could be used. The disadvantage is that the transcriptions are likely to be less accurate than if they were generated manually.

The DP match algorithm performed at steps 46 and 52 in FIG. 5 will now be described in detail, and some examples given. The DP match algorithm is used to align two sentences. The algorithm uses a standard DP alignment with a fixed general penalty for single insertions, deletions and substitutions. The alignment is symmetrical, i.e. deletions and insertions are treated as the same cost. For this reason, only deletions are mentioned.

In addition to the fixed general penalty for deletion and substitution, any number of specific substitutions and deletions may be specified along with their specific penalties. These specific substitution and deletion penalties may apply to sequences of tokens; for example the two phrases ‘I would like to’ and ‘can I’ may be specified as a possible substitution with low or zero penalty. This allows common equivalent phrases to be given lower substitution penalties than DP alignment using the fixed general penalty would assign them. The use of specific penalties also allows for insignificant phrases, e.g. ‘erm’, to be given low or zero deletion penalties.

In addition to being able to use particular substitution and deletion penalties, the particular substitutions and deletions, and their associated penalties, which were necessary in order to obtain the alignment which resulted in the lowest total penalty are determined. These penalties may then be stored in the local knowledge database 34 and used in another iteration of the DP match. Without modification, this would give exactly the same result as the first iteration. However, if these specific penalties are reduced, the alignment will be biased towards deleting or substituting these particular tokens or sequences of tokens.

Assume two sentences are represented by $S_x$ and $S_y$. At the start and the end of each sentence an additional token ‘#’ is appended as a sentence boundary marker. $L_x$ and $L_y$ represent the lengths (including sentence boundary markers) of sentences $S_x$ and $S_y$. $w_i^x$ stands for the i'th word in sentence $S_x$, indexed from the zero'th word. Hence:

$w_0^x = \#$ and $w_{L_x - 1}^x = \#$

We are going to populate an $L_x$ by $L_y$ array $d$, starting with $d(0,0)$, such that the element $d(L_x - 1, L_y - 1)$ of the array will give the minimum distance $D(S_x, S_y)$, which represents the lowest possible cumulative penalty for aligning $S_x$ and $S_y$.

The definition of $d$ is recursive:

$d(0,0) = 0$

$d(i,j) = \min\lbrack O(i,j), P(i,j), Q(i,j) \rbrack$

Here the functions $O(i,j)$, $P(i,j)$ and $Q(i,j)$ each represent a possible contribution due to penalties for deletion of tokens in $S_x$, penalties for deletion of tokens in $S_y$, and penalties for substitution of tokens between $S_x$ and $S_y$ respectively. The minimum of these in turn gives the minimum distance at point $d(i,j)$.

For a general DP match, $O(i,j)$, $P(i,j)$ and $Q(i,j)$ are defined as follows:

For two words $w_i^x$ and $w_j^y$,

$c(w_i^x, w_j^y) = 0$ if $w_i^x = w_j^y$, otherwise $c(w_i^x, w_j^y) = 1$

and

$O(i,j) = d(i-1, j) + A$ for $i > 0$, else $O(i,j) = \infty$

$P(i,j) = d(i, j-1) + A$ for $j > 0$, else $P(i,j) = \infty$

$Q(i,j) = d(i-1, j-1) + B \cdot c(w_i^x, w_j^y)$ for $i > 0, j > 0$, else $Q(i,j) = \infty$

where $A$ is the general deletion penalty and $B$ is the general substitution penalty.

It has been found that a normalised distance is useful when comparing sentences of different lengths. The maximum possible cost $m(L_x, L_y)$ between two sentences of lengths $L_x, L_y$ is

$m(L_x, L_y) = A \cdot \mathrm{abs}(L_x - L_y) + B \cdot \min(L_x - 2, L_y - 2)$ if $2A > B$, otherwise

$m(L_x, L_y) = A \cdot (L_x + L_y - 4)$

The normalised cost $N(S_x, S_y)$ is

$N(S_x, S_y) = \frac{D(S_x, S_y)}{m(L_x, L_y)}$
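A minimal sketch of this general DP match and the normalised cost, assuming sentences arrive as token lists already wrapped in ‘#’ boundary markers; the function names are illustrative, not from the patent:

```python
# A sketch of the general DP match with fixed penalties A and B, and the
# normalised cost N. Sentences are lists of tokens including the '#'
# boundary markers. Names are illustrative.

from typing import List

A = 7.0   # general deletion/insertion penalty
B = 10.0  # general substitution penalty

def dp_distance(sx: List[str], sy: List[str]) -> float:
    """Minimum cumulative penalty D(Sx, Sy) for aligning two sentences."""
    lx, ly = len(sx), len(sy)
    INF = float("inf")
    d = [[INF] * ly for _ in range(lx)]
    d[0][0] = 0.0
    for i in range(lx):
        for j in range(ly):
            if i == 0 and j == 0:
                continue
            o = d[i - 1][j] + A if i > 0 else INF   # O: delete token in Sx
            p = d[i][j - 1] + A if j > 0 else INF   # P: delete token in Sy
            q = INF                                  # Q: substitute tokens
            if i > 0 and j > 0:
                c = 0.0 if sx[i] == sy[j] else 1.0
                q = d[i - 1][j - 1] + B * c
            d[i][j] = min(o, p, q)
    return d[lx - 1][ly - 1]

def normalised_cost(sx: List[str], sy: List[str]) -> float:
    """N(Sx, Sy) = D(Sx, Sy) / m(Lx, Ly) as defined above."""
    lx, ly = len(sx), len(sy)
    if 2 * A > B:
        m = A * abs(lx - ly) + B * min(lx - 2, ly - 2)
    else:
        m = A * (lx + ly - 4)
    return dp_distance(sx, sy) / m

# Reproduces the first worked example given later in the text:
s1 = "# thankyou reverse the charges #".split()
s2 = "# reverse charge #".split()
print(dp_distance(s1, s2))                # 24.0
print(round(normalised_cost(s1, s2), 3))  # 0.706
```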

The DP match is extended in this invention to include specific penalties. Specific penalties are defined as follows for certain substitutions or deletions of tokens or sequences of tokens. These specific penalties are stored in the local knowledge database 34. Taking the case of deletions first, the deletion penalty $p(w_a w_b \ldots w_N)$, giving the penalty of deleting the arbitrary token sequence $w_a, w_b \ldots w_N$, is

$p(w_a w_b \ldots w_N) = \text{value}$

where value is defined in a look-up table. If value has not been specified in the look-up table then the general penalties apply:

-   $p(w_a) = A$ (for only one token deleted), otherwise
-   $p(w_a w_b \ldots w_N) = \infty$ (for deletion of sequences of tokens)

Similarly, for specific substitution penalties, let the substitution penalty $q(v_a v_b \ldots v_N, w_a w_b \ldots w_M)$, giving the cost of substituting an arbitrary word sequence $v_a v_b \ldots v_N$ with another arbitrary word sequence $w_a w_b \ldots w_M$ or vice versa, be defined as:

$q(v_a v_b \ldots v_N, w_a w_b \ldots w_M) = \text{value}$

where value may be defined in a look-up table. If value has not been specified in the look-up table then the general substitution penalties apply:

-   $q(v_a, w_a) = B \cdot c(v_a, w_a)$ (for substitution of a single token with a single token), otherwise
-   $q(v_a v_b \ldots v_N, w_a w_b \ldots w_M) = \infty$ (for substitution of a sequence of tokens with a sequence of tokens)

The functions $O(i,j)$, $P(i,j)$ and $Q(i,j)$ are re-defined as follows:

$O(i,j) = \min_{k = 0 \ldots i-1} \lbrack d(i-k-1, j) + p(w_{i-k}^x \ldots w_i^x) \rbrack$ for $i > 0$, else $O(i,j) = \infty$

$P(i,j) = \min_{l = 0 \ldots j-1} \lbrack d(i, j-l-1) + p(w_{j-l}^y \ldots w_j^y) \rbrack$ for $j > 0$, else $P(i,j) = \infty$

$Q(i,j) = \min_{k = 0 \ldots i-1,\; l = 0 \ldots j-1} \lbrack d(i-k-1, j-l-1) + q(w_{i-k}^x \ldots w_i^x,\; w_{j-l}^y \ldots w_j^y) \rbrack$ for $i > 0, j > 0$, else $Q(i,j) = \infty$

The above equations are equivalent to the general equations in the case where there are no specific deletion and substitution penalties defined.

Expressions which evaluate to infinity may be ignored in the calculation. Therefore if there are few specific deletion and substitution penalties, this algorithm is still fairly efficient. For a given sentence $S_x$ all of the possible deletion and substitution penalties which may be relevant for a given word in $S_x$ may be calculated once only for the sentence, regardless of which sentence it is to be compared with.
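A sketch of this extended DP match, assuming the look-up tables of the local knowledge database 34 are held as plain dictionaries keyed by token tuples; all names are illustrative, not from the patent:

```python
# A sketch of the extended DP match with specific penalties. The look-up
# tables p_table and q_table are assumed to be dicts keyed by token tuples.

from typing import Dict, List, Tuple

INF = float("inf")
A, B = 7.0, 10.0  # general deletion and substitution penalties
Seq = Tuple[str, ...]

def p(seq: Seq, p_table: Dict[Seq, float]) -> float:
    """Deletion penalty: look-up value if specified, else general penalties."""
    if seq in p_table:
        return p_table[seq]
    return A if len(seq) == 1 else INF

def q(sa: Seq, sb: Seq, q_table: Dict[Tuple[Seq, Seq], float]) -> float:
    """Substitution penalty, symmetric ('or vice versa')."""
    if (sa, sb) in q_table:
        return q_table[(sa, sb)]
    if (sb, sa) in q_table:
        return q_table[(sb, sa)]
    if len(sa) == 1 and len(sb) == 1:
        return 0.0 if sa == sb else B  # B.c(v_a, w_a)
    return INF

def dp_distance_specific(sx: List[str], sy: List[str],
                         p_table: Dict[Seq, float],
                         q_table: Dict[Tuple[Seq, Seq], float]) -> float:
    lx, ly = len(sx), len(sy)
    d = [[INF] * ly for _ in range(lx)]
    d[0][0] = 0.0
    for i in range(lx):
        for j in range(ly):
            if i == 0 and j == 0:
                continue
            best = INF
            for k in range(i):   # O(i,j): delete w[i-k..i] from Sx
                best = min(best, d[i-k-1][j] + p(tuple(sx[i-k:i+1]), p_table))
            for l in range(j):   # P(i,j): delete w[j-l..j] from Sy
                best = min(best, d[i][j-l-1] + p(tuple(sy[j-l:j+1]), p_table))
            for k in range(i):   # Q(i,j): substitute token sequences
                for l in range(j):
                    best = min(best, d[i-k-1][j-l-1] +
                               q(tuple(sx[i-k:i+1]), tuple(sy[j-l:j+1]), q_table))
            d[i][j] = best
    return d[lx-1][ly-1]

# With the specific penalties of the second worked example below, the
# alignment cost drops from 24 to 0:
p_table = {("thankyou",): 0.0}
q_table = {(("charge",), ("the", "charges")): 0.0}
s1 = "# thankyou reverse the charges #".split()
s2 = "# reverse charge #".split()
print(dp_distance_specific(s1, s2, p_table, q_table))  # 0.0
```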

In addition to knowing the minimum distance between two sentences, the optimal alignment between the sentences needs to be known so that specific penalties may be calculated for future use during an unsupervised DP match. This optimal alignment may be regarded as the route through the matrix $d(i,j)$ which leads to the optimal solution, $d(L_x - 1, L_y - 1)$. A matrix $t(i,j)$ of two-dimensional vectors is defined which is used to find the optimal alignment. These vectors store the value pair $(k+1, l+1)$ for the values of $k$ and $l$ which caused the minimum solution to be found for $d(i,j)$; $k$ and $l$ may have come from $O(i,j)$, $P(i,j)$ or $Q(i,j)$ depending upon which was the minimum solution. Thus the two components $t^x(i,j)$ and $t^y(i,j)$ are defined as:

$t^x(i,j) = 1 + \arg\min_k(d(i,j))$

$t^y(i,j) = 1 + \arg\min_l(d(i,j))$

The traceback matrix $t(i,j)$ may then be used to align the two sentences $S_x$ and $S_y$ optimally against one another. Defining an iterator $h$, we can recursively trace back from $d(L_x - 1, L_y - 1)$ to discover a sequence of co-ordinate pairs $v^x(h)$ and $v^y(h)$ of all points visited in the optimal alignment:

$v^x(0) = L_x - 1$

$v^y(0) = L_y - 1$

$v^x(h) = v^x(h-1) - t^x(v^x(h-1), v^y(h-1))$ for $h > 0$

$v^y(h) = v^y(h-1) - t^y(v^x(h-1), v^y(h-1))$ for $h > 0$

This traceback ends when $v^x(h)$ and $v^y(h)$ both equal zero, i.e. the origin is reached. The value of $h$ at this point, $h_{max}$, is equal to the number of alignment steps required to align the two sentences $S_x$ and $S_y$. This gives us a vector of traceback fragments for each sentence, given by:

$f_x(h) = w_{1 + v^x(h)}^x \ldots w_{v^x(h-1)}^x$ for $1 \le h \le h_{max}$

$f_y(h) = w_{1 + v^y(h)}^y \ldots w_{v^y(h-1)}^y$ for $1 \le h \le h_{max}$

Discovered Substitutions and Deletions

The traceback vector can be used to discover the substitutions and deletions which were necessary to match the two sentences. It is trivial to identify single word substitutions or deletions which were required, but it is advantageous to discover the largest possible sequences of words which were substituted or deleted. This is done by finding sequences of words in the aligned sequences which occur between substitutions or deletions of zero cost. First of all we derive a vector of cost differences for index $h$:

$\delta(h) = d(v(h)) - d(v(h+1))$ for $0 \le h < h_{max}$

$\delta(h_{max}) = 0$

This vector has value zero for all substitutions or deletions which had zero penalties (these will simply be matching words if there are no specific penalties active). Maximum length adjacent sequences of non-zero values in the cost differences vector define the discovered specific penalties (deletion penalties $p(\,)$ and substitution penalties $q(\,)$).
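The traceback and the grouping of non-zero-cost steps can be sketched as follows for the single-step general DP match, so that $t(i,j)$ only ever holds (1,0), (0,1) or (1,1); this sketch reproduces the discovered penalties of the worked example which follows, and all names are illustrative:

```python
# A sketch of traceback and discovery of substituted/deleted word sequences
# for the general (no specific penalties) DP match. Maximal runs of steps
# with non-zero cost difference are grouped into discovered penalties.

from typing import List

def align_and_discover(sx: List[str], sy: List[str], a: float = 7.0,
                       b: float = 10.0):
    lx, ly = len(sx), len(sy)
    INF = float("inf")
    d = [[INF] * ly for _ in range(lx)]
    t = [[(0, 0)] * ly for _ in range(lx)]   # traceback vectors (k+1, l+1)
    d[0][0] = 0.0
    for i in range(lx):
        for j in range(ly):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((d[i - 1][j] + a, (1, 0)))          # delete in Sx
            if j > 0:
                cands.append((d[i][j - 1] + a, (0, 1)))          # delete in Sy
            if i > 0 and j > 0:
                c = 0.0 if sx[i] == sy[j] else 1.0
                cands.append((d[i - 1][j - 1] + b * c, (1, 1)))  # substitute
            d[i][j], t[i][j] = min(cands, key=lambda x: x[0])
    # Trace back from (Lx-1, Ly-1) to the origin, recording the tokens
    # consumed by each step and its cost difference delta(h).
    steps, i, j = [], lx - 1, ly - 1
    while (i, j) != (0, 0):
        di, dj = t[i][j]
        pi, pj = i - di, j - dj
        steps.append((sx[pi + 1:i + 1], sy[pj + 1:j + 1], d[i][j] - d[pi][pj]))
        i, j = pi, pj
    steps.reverse()
    # Group maximal adjacent non-zero-cost steps into discovered penalties.
    found, run_x, run_y, cost = [], [], [], 0.0
    for fx, fy, delta in steps + [([], [], 0.0)]:   # sentinel flushes last run
        if delta > 0:
            run_x += fx
            run_y += fy
            cost += delta
        elif run_x or run_y:
            found.append((run_x, run_y, cost))
            run_x, run_y, cost = [], [], 0.0
    return d[lx - 1][ly - 1], found

dist, found = align_and_discover(
    "# thankyou reverse the charges #".split(), "# reverse charge #".split())
print(dist)   # 24.0
print(found)  # [(['thankyou'], [], 7.0), (['the', 'charges'], ['charge'], 17.0)]
```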

An example of the above algorithm in operation will now be described. Assume it is required to align the sentences (including the end of sentence tokens) “# thankyou reverse the charges #” and “# reverse charge #”.

If A=7 and B=10 then

d(i,j) =

                 #       thankyou   reverse   the      charges   #
        #        0.00    7.00       14.00     21.00    28.00     35.00
        reverse  7.00    10.00      7.00      14.00    21.00     28.00
        charge   14.00   17.00      14.00     17.00    24.00     31.00
        #        21.00   24.00      21.00     24.00    27.00     24.00

t(i,j) =

                 #      thankyou   reverse   the     charges   #
        #        0,0    1,0        1,0       1,0     1,0       1,0
        reverse  0,1    1,1        1,1       1,0     1,0       1,0
        charge   0,1    1,1        0,1       1,1     1,1       1,1
        #        0,1    1,1        0,1       1,1     1,1       1,1

Alignment:

        h
        5    #           #          0.0
        4    thankyou    —          7.0
        3    reverse     reverse    7.0
        2    the         —          14.0
        1    charges     charge     24.0
        0    #           #          24.0

Discovered Substitutions/Deletions:

-   q(the charges, charge) = 17.0
-   p(thankyou) = 7.0

Now it is possible to reduce these penalties and store them in the local knowledge database 34 for use in future alignment processes.

For example, if

-   General Deletion Penalty A = 7
-   General Substitution Penalty B = 10

Particular Substitutions:

-   q(charge, the charges) = 0.0 (i.e. these phrases are synonymous)

Particular Deletions:

-   p(thankyou) = 0.0 (i.e. thankyou is irrelevant to the meaning of the phrase)

and it is required to align the sentences (including the end of sentence tokens) “# thankyou reverse the charges #” and “# reverse charge #”, the matrices are now as follows:

Cost Matrix: d(i,j)

                 #       thankyou   reverse   the      charges   #
        #        0.00    0.00       7.00      14.00    21.00     28.00
        reverse  7.00    7.00       0.00      7.00     14.00     21.00
        charge   14.00   14.00      7.00      10.00    0.00      7.00
        #        21.00   21.00      14.00     17.00    7.00      0.00

Traceback Matrix: t(i,j)

                 #      thankyou   reverse   the     charges   #
        #        0,0    1,0        1,0       1,0     1,0       1,0
        reverse  0,1    1,0        1,1       1,0     1,0       1,0
        charge   0,1    1,0        0,1       1,1     2,1       1,0
        #        0,1    1,0        0,1       1,1     0,1       1,1

Alignment:

        h
        4    #              #          0.0
        3    thankyou       —          0.0
        2    reverse        reverse    0.0
        1    the charges    charge     0.0
        0    #              #          0.0

Discovered Substitutions/Deletions: none

Therefore the penalty for aligning the sentences is now 0.

The clustering algorithm used in this embodiment of the invention will now be described with reference to FIG. 6, assuming that all the sentences have been aligned, as described above, with all other sentences in the database and the minimum distance between each sentence and each other sentence has been recorded. At step 60 a sentence which does not yet form part of a cluster is chosen randomly from the database 34. At step 62 all other sentences which do not yet form part of a cluster and which have a minimum distance from the chosen sentence less than a predetermined distance are determined. At step 64 the randomly chosen sentence and the sentences determined at step 62 are placed into a cluster. At step 66, if no sentences were determined at step 62, then the randomly chosen sentence is placed in a ‘cluster’ which is reserved for sentences which do not cluster with any others. At step 68 a check is made as to whether all the sentences in the database 34 form part of a cluster; if so then the process terminates, otherwise steps 60-68 are repeated until all the sentences in the database form part of a cluster. Each cluster may then be regarded as a discovered dialogue state.
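A minimal sketch of this clustering step, assuming the pairwise minimum (or normalised) distances have already been computed into a matrix dist; all names are illustrative:

```python
# A sketch of the clustering algorithm of FIG. 6. dist[x][y] is assumed to
# hold the recorded minimum distance between sentences x and y.

import random

def cluster_sentences(sentences, dist, threshold):
    unclustered = set(range(len(sentences)))
    clusters = []
    singletons = []   # the reserved 'cluster' for sentences matching no others
    while unclustered:
        seed = random.choice(list(unclustered))             # step 60
        unclustered.discard(seed)
        near = [s for s in unclustered                      # step 62
                if dist[seed][s] < threshold]
        if near:                                            # step 64
            clusters.append([seed] + near)
            unclustered.difference_update(near)
        else:                                               # step 66
            singletons.append(seed)
    # Loop until every sentence is in a cluster (step 68).
    # Each cluster may then be regarded as a discovered dialogue state.
    return clusters, singletons
```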

Once the sentences in the training database have been clustered there are a number of possible uses for the data. Each call in the corpus 38 can be annotated according to the clusters (or discovered dialogue states) of each operator utterance in the call. Known techniques can then be used to generate a grammar, for example a finite state network of dialogue states, or a bigram or n-gram grammar, for use in natural language automated call routing systems.
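As one illustration of the known techniques mentioned, a bigram grammar over discovered dialogue states might be estimated as follows; calls and state_of are assumed inputs invented for the example, not names from the patent:

```python
# A sketch of bigram grammar estimation over discovered dialogue states.
# 'calls' is a list of calls, each a list of operator utterances;
# 'state_of' maps an utterance to its cluster (dialogue state) id.

from collections import Counter

def bigram_grammar(calls, state_of):
    counts = Counter()
    for call in calls:
        # Annotate the call with the discovered dialogue state of each
        # operator utterance, bracketed by start/end markers.
        states = ["<start>"] + [state_of(u) for u in call] + ["<end>"]
        counts.update(zip(states, states[1:]))
    totals = Counter()
    for (prev, _), n in counts.items():
        totals[prev] += n
    # Conditional probability of each state following the previous one.
    return {(prev, nxt): n / totals[prev] for (prev, nxt), n in counts.items()}
```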

If the corpus 38 is generated automatically it is also possible to use the determined dialogue states to generate statistics for the various types of task being handled by the call centre. Statistics may be generated to determine the number and types of calls being handled by the operators.

As will be understood by those skilled in the art, the dialogue state discovery program 109 can be contained on various transmission and/or storage mediums such as a floppy disc, CD-ROM, or magnetic tape so that the program can be loaded onto one or more general purpose computers or could be downloaded over a computer network using a suitable transmission medium.

Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising” and the like are to be construed in an inclusive as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”.

CLAIMS

1. A method of classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols, the method comprising: a) determining a distance between each sequence and each other sequence in said plurality of sequences in dependence upon a set of semantically insignificant symbol sequences and a set of equivalent symbol sequence pairs; and b) grouping the plurality of sequences into a plurality of sets in dependence upon said distances; wherein the symbols are words transcribed from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center.

2. A method according to claim 1 wherein the words are transcribed from call center operator speech signals using a speaker dependent speech recogniser.

3. A method of generating a set of semantically insignificant symbol sequences for use in the method of claim 1, comprising: classifying a plurality of sequences of symbols into a plurality of sets; for each of the sets, determining an optimal alignment between each sequence thereof and each other sequence in that set; and allocating a symbol or sequence of symbols to the set of semantically insignificant symbol sequences, the symbol or sequence of symbols having been deleted to obtain an optimal alignment between two sequences of a set.

4. A method of generating a set of equivalent symbol sequence pairs for use in the method of claim 1, comprising: classifying a plurality of sequences of symbols into a plurality of sets; determining an optimal alignment between each sequence in a set and each other sequence in that set; and allocating a pair of symbols or sequences of symbols to the set of equivalent symbol sequences, the symbols or sequences of symbols having been substituted for each other to obtain an optimal alignment between two sequences of a set.

5. A method of generating a grammar for inquiries made to a call center, using the plurality of sets of sequences of words generated according to claim 1, comprising: transcribing a plurality of inquiries according to which of the sets the sequences of words in the inquiry occur; and generating a grammar in dependence upon the resulting transcription.

6. A method of measuring the occurrence of particular types of telephone inquiry received in a call center using the plurality of subsets of sequences of words generated according to claim 1.

7. An apparatus for classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols, the symbols being words transcribed from call center operator speech signals spoken to a caller during an inquiry from the caller to the call center, the apparatus comprising: a store for storing a set of semantically insignificant symbol sequences; a store for storing a set of equivalent symbol sequence pairs; determining means connected to receive the transcribed call center operator speech signals and further arranged to determine a distance between each sequence and each other sequence in said plurality of sequences in dependence upon the set of semantically insignificant symbol sequences and the set of equivalent symbol sequence pairs; and means for grouping the plurality of sequences into a plurality of sets in dependence upon said distances.

8. An apparatus according to claim 7, further comprising a speaker dependent recogniser for transcribing call center operator speech signals generated during the inquiry by the caller to the call center.

9. An apparatus for generating a set of semantically insignificant symbol sequences for use by the apparatus of claim 7, comprising: a classifier for classifying a plurality of sequences of symbols into a plurality of sets; alignment means for determining an optimal alignment for each of the sets between each sequence thereof and each other sequence in that set; and means for allocating a symbol or sequence of symbols to the set of semantically insignificant symbol sequences, the symbol or sequence of symbols having been deleted to obtain an optimal alignment between two sequences of a set.

10. An apparatus for generating a set of equivalent symbol sequence pairs for use by the apparatus of claim 7, comprising: a classifier for classifying a plurality of sequences of symbols into a plurality of subsets; means for determining an optimal alignment between each sequence in a set and each other sequence in that set; and means for allocating a pair of symbols or sequences of symbols to the set of equivalent symbol sequences, the symbols or sequences of symbols having been substituted for each other to obtain an optimal alignment between two sequences of a set.

11. An apparatus for generating a grammar for inquiries made to a call center by a caller, comprising: a store for storing a plurality of sets of sequences of words, the sequences having been classified into the sets by an apparatus according to claim 7; means for transcribing a plurality of inquiries according to which of the sets of sequences of words in the inquiry occur; and means for generating a grammar in dependence upon the resulting transcription.

12. A data carrier loadable into a computer and carrying instructions for causing the computer to carry out the method according to claim 1.

13. A data carrier loadable into a computer and carrying instructions for enabling the computer to provide the apparatus according to claim 7.

14. A method of classifying a plurality of sequences of words to form a plurality of sets of sequences of words, the method comprising: transcribing the plurality of sequences of words from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center; determining a distance between each sequence of words and each other sequence of words in said plurality of sequences; and grouping the plurality of sequences of words into a plurality of sets in dependence upon said distances.

15. A method according to claim 14, in which the words are transcribed from call center operator speech signals using a speaker dependent speech recogniser.

16. A method of generating a grammar for inquiries made by a caller to a call center, using the plurality of sets of sequences of words generated according to claim 14, comprising: transcribing a plurality of inquiries according to which of the sets the sequences of words in the inquiry occur; and generating a grammar in dependence upon the resulting transcription.

17. A method of measuring the occurrence of particular types of telephone inquiry received in a call center using the plurality of subsets of sequences of words generated according to claim 14.

18. An apparatus for classifying a plurality of sequences of words to form a plurality of sets of sequences of words, the apparatus comprising: transcribing means for transcribing the plurality of sequences of words from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center; determining means connected to receive the transcribed call center operator speech signals and further arranged to determine a distance between each sequence and each other sequence in said plurality of sequences in dependence upon a set of semantically insignificant symbol sequences and a set of equivalent symbol sequence pairs; and means for grouping the plurality of sequences into a plurality of sets in dependence upon said distances.

19. An apparatus according to claim 18, wherein the transcribing means further comprise a speaker dependent recogniser for transcribing the call center operator speech signals generated during the inquiry from the caller to the call center.